Fusion of visible and infrared images using GE-WA model and VGG-19 network

To address the low computational efficiency of existing image fusion models and the false targets, blurred targets, and halo-occluded targets they produce, a novel fusion method of visible and infrared images using the GE-WA model and the VGG-19 network is proposed. First, the Laplacian operator is used to decompose the visible and infrared images into basic images and detail content. Next, a Gaussian estimation function is constructed, and a basic fusion scheme using the GE-WA model is designed to obtain a basic fusion image that eliminates the halo of the visible image. Then, the pre-trained VGG-19 network and a multi-layer fusion strategy are used to extract and fuse features of different depths from the visible and infrared images, yielding fused detail content that combines features of different depths. Finally, the fusion image is reconstructed from the fused basic image and detail content. The experiments show that the comprehensive evaluation FQ of the proposed method is better than that of the comparison methods, and that the proposed method performs better in terms of fusion speed, halo elimination in the visible image, and fusion quality, making it more suitable for visible and infrared image fusion in complex environments.


Some transform-domain (TD) methods overcome the insufficient number of decomposition directions of DWT but lack shift invariance27, so the fused image is prone to the pseudo-Gibbs phenomenon, which degrades its visual quality. DTCWT significantly improves the shift sensitivity and direction selectivity of DWT, but DTCWT and DWT have similar fusion strategies, so DTCWT cannot effectively remedy the defects of the fused images8. NSCT and NSST have all the advantages of the above TD methods as well as shift invariance27, but, as with other TD methods, the fused image is prone to ghosting in the background and at the edges of salient targets28, and the image fusion process is time-consuming and inefficient21,27.

Fusion strategies based on the spatial domain (SD) use region division to select different regions and apply different fusion rules to each region21. Among them, the pixel-level fusion method is highly efficient and can better retain the region information of infrared targets and the detailed features of visible targets, but because the SD method only fuses pixels, it easily loses part of the detailed information8,9. Sparse dictionary learning aims to learn an over-complete dictionary from a large number of high-quality images to obtain an effective sparse representation of the source image15, but most such fusion methods are computationally complex and need the sparse representation model and fusion rules to be determined from prior knowledge, which is a significant limitation8,16. Data-driven fusion strategies use different neural network models to extract deep features of the source images and realize heterogeneous image fusion by feature selection and feature fusion8,29. Their main advantage, as in CNN-based classification tasks, is that they remove the complicated process of manually setting parameters and make it easier to obtain good fusion results. However, data-driven methods also have disadvantages, such as a large amount of computation, poor model generalization ability, and complex network design9,11.

The above-mentioned image fusion methods cannot meet application scenarios with high real-time requirements, such as equipment fault diagnosis, night security, and fire monitoring, and suffer from a complex fusion process, loss of detailed information, and poor resistance to light interference. This paper therefore proposes an image fusion method using the GE-WA model and the VGG-19 network. Firstly, the source image is decomposed into a basic image and detail content by the Laplacian operator. Secondly, a fusion model based on GE-WA is designed to fuse the basic images. Then, the multi-layer depth features of the detail content are extracted by the pre-trained VGG-19 network, and feature maps are constructed from the different depth features; meanwhile, a weighted fusion and maximum selection strategy based on the detail content is used to fuse the feature maps of different depths and obtain the fused detail content. Finally, the fused basic image and detail content are reconstructed.
In summary, the main contributions of this paper are threefold. (1) A fusion model based on GE-WA is designed to eliminate the halo of the visible basic image and improve the anti-light-interference ability of the fusion method. (2) A weighted fusion strategy based on detail content is designed to fuse the same-depth feature maps of the visible and infrared detail content, and a maximum selection strategy is used to fuse the fused feature maps of different depths to obtain the fused detail content. (3) A novel fusion method of visible and infrared images using the GE-WA model and the VGG-19 network is proposed, which overcomes the shortcomings of traditional fusion methods, such as low fusion efficiency and easy loss of detailed information, and improves anti-interference ability and robustness in complex environments.
The rest of the paper is organized as follows. In Section "The GE-WA model and pre-trained VGG-19 network", the proposed fusion method is introduced in detail. In Section "The proposed method implementation steps and strategies", we briefly summarize the image fusion process. In Section "Experiments and results analysis", the experimental setup, results, and analysis are presented. In Section "Conclusions", the conclusions of this paper are given.

The GE-WA model and pre-trained VGG-19 network
The goal is to obtain a fusion image that contains the temperature distribution of the target and the details of the scene while suppressing interfering light sources and halos in the background. According to the principle of affine transformation30, the visible and infrared images collected simultaneously in the same surveillance scene are registered, with the infrared image used as the reference image. Because the Laplacian transform can quickly enhance the detailed information of an image and find its edge and texture features, this paper uses Laplacian sharpening to obtain the basic image and detail content of the image.
The registered source image $I_k$ is decomposed into a basic image $I_k^b$ and detail content $I_k^d$ (k = 1 denotes the infrared image; k = 2 denotes the visible image) using Laplacian sharpening31, and different fusion strategies are used to fuse $I_k^b$ and $I_k^d$, respectively. $I_k^b$ represents the approximate component of the image, which reflects the contour characteristics, while $I_k^d$ represents the detail component, which expresses the detailed information of the image and is also the part most sensitive to human eye recognition and machine vision9. Therefore, choosing the fusion rules for $I_k^b$ and $I_k^d$ is very important for the quality of the fusion image. The visible images of the complex street scene at night (CSSN) and the target of battlefield light curtain obscures (TBLCO) are sharpened by the Laplacian operator; the resulting basic images and detail contents are shown in Fig. 1.
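As a minimal sketch of this decomposition step (in Python; the exact Laplacian sharpening variant is an assumption, here a standard 3×3 OpenCV Laplacian whose response is taken as the detail content and whose residual is the basic image):

```python
import cv2
import numpy as np

def laplacian_decompose(img: np.ndarray):
    """Split a grayscale image into detail content (the Laplacian
    response, carrying edges and texture) and a basic image (the
    residual approximate component), so that basic + detail == img."""
    img = img.astype(np.float32)
    detail = cv2.Laplacian(img, cv2.CV_32F, ksize=3)  # I_k^d
    basic = img - detail                              # I_k^b
    return basic, detail

# I1: registered infrared image (k = 1), I2: visible image (k = 2)
# I1b, I1d = laplacian_decompose(I1)
# I2b, I2d = laplacian_decompose(I2)
```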
Basic image fusion strategy using GE-WA model. According to Fig. 1, the halo part of the visible image is concentrated in the basic image; accordingly, the main goal of the basic image fusion strategy is to eliminate the image halo. A large number of experiments show that, in the basic image after Laplacian sharpening, the gray level of the visible image in the halo part is close to a constant and significantly larger than in the non-halo area, and the gray-level variation follows a Gaussian distribution. Because the Laplacian operator is a differential operator, it enhances regions of abrupt gray-level change and weakens regions of slowly changing gray level. Therefore, this paper designs a basic image fusion scheme based on the Gaussian Estimation-Weighted Average (GE-WA) model. In the proposed model, the saliency coefficient of the visible basic image $I_2^b$ is automatically adjusted with its gray level, and the saliency coefficient function is constructed as Eq. (1):

$$P_{VIS}^b(x, y) = \exp\!\left(-\frac{\left(I_2^b(x, y) - E\right)^2}{2\varepsilon\sigma^2}\right) \quad (1)$$
where $P_{VIS}^b(x, y)$ is the saliency coefficient of $I_2^b$ at coordinates (x, y), $I_2^b(x, y)$ is the gray level of $I_2^b$ at coordinates (x, y), and E is the halo constant after Laplacian sharpening; according to the brightness of $I_2^b$, the mean of $I_2^b$ is used as the value of E. $\varepsilon$ is the adjustment factor, which represents the intensity of change at the critical point between the halo and non-halo parts of the image, and $\sigma^2$ is the variance of $I_2^b$. It can be seen from Eq. (1) that $P_{VIS}^b$ approaches 0 in the highlight and halo parts of $I_2^b$, gradually decreases as the brightness of $I_2^b$ drops, and is largest at the mean of $I_2^b$. Therefore, according to this variation pattern of the saliency coefficient, the saliency map with the highlight and halo areas of $I_2^b$ eliminated is calculated by Eq. (2):

$$A_{VIS}^b(x, y) = P_{VIS}^b(x, y) \cdot I_2^b(x, y) \quad (2)$$
where $A_{VIS}^b(x, y)$ is the saliency map of $I_2^b$ at coordinates (x, y). In the basic image fusion process, the highlight and halo parts of $I_2^b$ mainly take the information of $I_1^b$, while weighted fusion is performed in the non-halo parts to retain the useful information of both $I_1^b$ and $A_{VIS}^b$. $I_1^b$ and $A_{VIS}^b$ are weighted and fused by Eq. (3), and the fused basic image $F^b$ can be expressed as:

$$F^b(x, y) = \alpha_1 I_1^b(x, y) + \alpha_2 A_{VIS}^b(x, y) \quad (3)$$

where $F^b(x, y)$ is the gray level of $F^b$ at coordinates (x, y), and $\alpha_1$ and $\alpha_2$ are the weights of $I_1^b$ and $A_{VIS}^b$, respectively. In this paper, $\alpha_1$ and $\alpha_2$ are chosen so as to preserve the background information as much as possible.
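A minimal sketch of the GE-WA basic fusion of Eqs. (1)-(3); the values of ε, α1 and α2 are assumptions, since the text does not fix them:

```python
import numpy as np

def ge_wa_fuse_basic(I1b: np.ndarray, I2b: np.ndarray,
                     eps: float = 1.0, a1: float = 0.5, a2: float = 0.5):
    """GE-WA fusion of the infrared (I1b) and visible (I2b) basic images.
    eps (adjustment factor), a1 and a2 (fusion weights) are assumed values."""
    E = I2b.mean()    # halo constant E: mean of the visible basic image
    var = I2b.var()   # sigma^2: variance of the visible basic image
    # Eq. (1): Gaussian estimation of the saliency coefficient, close to 1
    # near the mean gray level and close to 0 in highlight/halo parts.
    P = np.exp(-((I2b - E) ** 2) / (2.0 * eps * var))
    # Eq. (2): saliency map with the highlight and halo areas suppressed.
    A = P * I2b
    # Eq. (3): weighted average with the infrared basic image.
    return a1 * I1b + a2 * A
```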
Detail content fusion strategy based on pre-trained VGG-19 network. In recent years, image processing technology using deep learning has become one of the research hotspots in the field of visual images. Since the VGG network is developed on the basis of the AlexNet network32, it has good generalization ability and can extract the depth features of each layer of an image33,34. Therefore, a new detail content fusion strategy using the pre-trained VGG-19 network model is designed, and its fusion framework is shown in Fig. 2. First, the pre-trained VGG-19 network35 is used to extract the multi-layer depth features of the detail content $I_k^d$; then the weight maps of the depth features are constructed from the multi-layer depth features; finally, the fused detail content $F^d$ is reconstructed from the obtained weight maps and $I_k^d$. The depth feature $\varphi_k^{i,m}$ in the i-th layer is obtained by the VGG-19 network model, and its expression is as follows:

$$\varphi_k^{i,m} = \Phi_i\!\left(I_k^d\right) \quad (4)$$

where $\Phi_i(\cdot)$ denotes the feature extraction of the i-th layer of the VGG-19 network and $m \in \{1, 2, \ldots, M\}$ indexes the feature channels of the i-th layer. Because the $l_1$-norm has the "sparse solution" characteristic of the regularization term, it is well suited to feature screening: it finds the "key" features and sets the unimportant ones to 0. The rough saliency map is therefore calculated with the channel-wise $l_1$-norm by Eq. (5):

$$C_k^i(x, y) = \left\| \varphi_k^{i,1:M}(x, y) \right\|_1 \quad (5)$$

where $C_k^i$ is the rough saliency map of $I_k^d$ in the i-th layer and $C_k^i(x, y)$ is the saliency of $C_k^i$ at the coordinates (x, y). The fine saliency map $\hat{C}_k^i$ of $C_k^i$ is calculated by a regional Gaussian operator, which makes the fusion algorithm robust to registration errors. $\hat{C}_k^i$ is calculated by Eq. (6):

$$\hat{C}_k^i(x, y) = \sum_{a=-r}^{r} \sum_{b=-r}^{r} w(a, b)\, C_k^i(x + a, y + b) \quad (6)$$
where $w(a, b)$ is the Gaussian operator coefficient at offset (a, b); the Gaussian operator uses a 2D Gaussian convolution kernel, and r is the radius of the regional block. A larger r gives the fusion method better robustness but loses more detailed information, so the region radius is set to r = 1 in this paper.
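The following sketch extracts the four depth features with torchvision's pre-trained VGG-19 and computes the rough and fine saliency maps of Eqs. (5) and (6). The layer choice (relu1_1, relu2_1, relu3_1, relu4_1, which matches the 1/2^{i-1} feature map sizes discussed below) and the omission of ImageNet input normalization are our assumptions:

```python
import cv2
import numpy as np
import torch
from torchvision.models import vgg19, VGG19_Weights

# Indices of relu1_1, relu2_1, relu3_1, relu4_1 in vgg19().features;
# the paper does not list its exact layer choice, so these are assumed.
LAYER_IDS = (1, 6, 11, 20)
FEATURES = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()

def depth_saliency_maps(Id: np.ndarray, r: int = 1) -> list:
    """Eqs. (4)-(6): extract multi-layer depth features of detail content
    Id, reduce each layer to a rough saliency map via the channel-wise
    l1-norm, then smooth with a (2r+1)x(2r+1) Gaussian kernel."""
    x = torch.from_numpy(Id.astype(np.float32))[None, None]  # (1,1,H,W)
    x = x.expand(-1, 3, -1, -1)  # VGG-19 expects 3 channels; replicate
    fine = []
    with torch.no_grad():
        for idx, layer in enumerate(FEATURES):
            x = layer(x)                                # Eq. (4): phi_k^{i,m}
            if idx in LAYER_IDS:
                C = x[0].abs().sum(dim=0).numpy()       # Eq. (5): l1-norm
                k = 2 * r + 1
                fine.append(cv2.GaussianBlur(C, (k, k), 0))  # Eq. (6)
            if idx == LAYER_IDS[-1]:
                break
    return fine  # fine saliency maps at depths i = 1..4
```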
The initial weight map $W_k^i(x, y)$ of $\hat{C}_k^i$ in the i-th layer is calculated by the soft-max operator in Eq. (7):

$$W_k^i(x, y) = \frac{\hat{C}_k^i(x, y)}{\sum_{n=1}^{K} \hat{C}_n^i(x, y)} \quad (7)$$
where K is the total number of fine saliency maps at each depth (K = 2 in this paper) and $W_k^i(x, y)$ is the value of $W_k^i$ at coordinates (x, y). The pooling layers in the VGG network sub-sample their input: each pooling operation resizes the feature map to 1/s times the size of its input, where s is the stride of the pooling layer (s = 2 in the VGG-19 network). Therefore, the feature map at the i-th depth is $1/2^{i-1}$ times the size of the detail content.
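Under the form of Eq. (7), the soft-max step reduces to a per-pixel normalization over the K = 2 fine saliency maps; a short sketch (the epsilon guard against division by zero is our addition):

```python
import numpy as np

def softmax_weights(C1: np.ndarray, C2: np.ndarray, eps: float = 1e-12):
    """Eq. (7): initial weight maps from the fine saliency maps of the
    infrared (C1) and visible (C2) detail content at one depth."""
    total = C1 + C2 + eps  # eps avoids 0/0 in completely flat regions
    return C1 / total, C2 / total
```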
The inverse of the pooling process is applied to $W_k^i$: the up-sampling operator in Eq. (8) reconstructs $W_k^i$ to the same size as $I_k^d$, giving four pairs of weight maps $\hat{W}_k^i$ of different depths:

$$\hat{W}_k^i(x + p, y + q) = W_k^i(x, y), \quad p, q \in \{0, 1, \ldots, 2^{i-1} - 1\} \quad (8)$$

The initial fused detail content $F_d^i$ of each depth is then obtained by weighted fusion in Eq. (9):

$$F_d^i(x, y) = \sum_{k=1}^{K} \hat{W}_k^i(x, y)\, I_k^d(x, y) \quad (9)$$

Finally, the maximum selection strategy selects the maximum value of the four $F_d^i$ at each coordinate, and the fused detail content $F^d$ is obtained through Eq. (10):

$$F^d(x, y) = \max\left\{ F_d^i(x, y) \mid i \in \{1, 2, 3, 4\} \right\} \quad (10)$$
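A sketch of Eqs. (8)-(10), assuming nearest-neighbour up-sampling as the inverse of the pooling operation:

```python
import cv2
import numpy as np

def fuse_details(I1d: np.ndarray, I2d: np.ndarray, W1: list, W2: list):
    """W1 and W2 hold the four per-depth weight maps of the infrared and
    visible detail content; returns the fused detail content F^d."""
    H, W = I1d.shape
    fused = []
    for w1, w2 in zip(W1, W2):
        # Eq. (8): up-sample the weight maps back to detail-content size.
        w1 = cv2.resize(w1, (W, H), interpolation=cv2.INTER_NEAREST)
        w2 = cv2.resize(w2, (W, H), interpolation=cv2.INTER_NEAREST)
        # Eq. (9): weighted fusion of the detail content at this depth.
        fused.append(w1 * I1d + w2 * I2d)
    # Eq. (10): maximum selection across the four depths.
    return np.max(np.stack(fused), axis=0)
```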
Reconstruction. The fused basic image $F^b$ and detail content $F^d$ are combined by Eq. (11) to obtain the final fusion image F:

$$F(x, y) = F^b(x, y) + F^d(x, y) \quad (11)$$
where F(x, y) is the pixel value of the fusion image F at coordinates (x, y).

The proposed method implementation steps and strategies
The main steps of the proposed visible and infrared image fusion method based on the GE-WA model and a deep learning framework are as follows:
Step 1: Perform Laplacian sharpening on the registered source image $I_k$ to obtain the basic image $I_k^b$ and the detail content $I_k^d$.
Step 2: According to the gray distribution rule of $I_2^b$, design the Gaussian estimation model $P_{VIS}^b$ to obtain the saliency coefficients of $I_2^b$.
Step 3: Construct the basic image fusion strategy based on the GE-WA model, and perform halo elimination and weighted fusion on $I_k^b$ to obtain the fused basic image $F^b$.
Step 4: Use the pre-trained VGG-19 network model to obtain the multi-layer depth features of $I_k^d$, use the $l_1$-norm to calculate the 4-layer rough saliency maps $C_k^i$ of $I_k^d$, and calculate the fine saliency map $\hat{C}_k^i$ of each $C_k^i$ with the regional Gaussian operator.
Step 5: Calculate the initial weight map $W_k^i$ of $\hat{C}_k^i$ in the i-th layer by the soft-max operator, and use the inverse pooling operation and the up-sampling operator to reconstruct the size of $W_k^i$, obtaining the weight map $\hat{W}_k^i$ of the i-th layer.
Step 6: Repeat Step 5 to obtain the 4 pairs of weight maps $\hat{W}_k^i$ in turn; use weighted fusion to obtain the initial fused detail content $F_d^i$, and use the maximum selection strategy to take the maximum value of the 4 layers of $F_d^i$ at each coordinate, obtaining the fused detail content $F^d$.
Step 7: Reconstruct the fused basic image $F^b$ and fused detail content $F^d$ to obtain the fusion image F. The framework of the fusion method is shown in Fig. 3.
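Putting Steps 1-7 together, reusing the sketch functions defined in the previous section (the function names are ours, not the paper's):

```python
def fuse(I1, I2):
    """End-to-end sketch of Steps 1-7 for a registered infrared image I1
    and visible image I2 (grayscale NumPy arrays)."""
    # Step 1: Laplacian sharpening splits each source image.
    I1b, I1d = laplacian_decompose(I1)
    I2b, I2d = laplacian_decompose(I2)
    # Steps 2-3: GE-WA model fuses the basic images (Eqs. (1)-(3)).
    Fb = ge_wa_fuse_basic(I1b, I2b)
    # Step 4: VGG-19 depth features -> fine saliency maps (Eqs. (4)-(6)).
    C1 = depth_saliency_maps(I1d)
    C2 = depth_saliency_maps(I2d)
    # Steps 5-6: per-depth weight maps, weighted fusion, maximum selection.
    W1, W2 = zip(*(softmax_weights(c1, c2) for c1, c2 in zip(C1, C2)))
    Fd = fuse_details(I1d, I2d, list(W1), list(W2))
    # Step 7: reconstruction, Eq. (11).
    return Fb + Fd
```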

Experiments and results analysis
In order to verify the effectiveness and advancement of the proposed method in different complex lighting environments, a test dataset containing halo images is constructed, and strictly registered source images are used for the experimental verification. At the same time, the proposed method is compared with seven typical fusion methods (WLS36 and the methods of Refs. 37, 38, 26, 39, 40 and 17).

From the fusion results of the four typical source images (Figs. 4-7), it can be seen that the image fused by the proposed method has rich detailed information, significant targets, and good visual effects, while all the comparison methods show fusion defects. As shown in Fig. 4c, d, f, g, the interference target in the red marked box cannot be suppressed, and the details of the image are reduced in Fig. 4e, h, i. The image background details and target information are blurry in Fig. 5c-e, h, i, and the halo part is distorted in Fig. 5f, g. There are obvious shadow areas in Fig. 6e and f, the halo parts cannot be eliminated in Fig. 6c and g, and the outline of the target is blurred with part of the detailed information lost in Fig. 6d, h, i. The halo part cannot be eliminated in Fig. 7c, d, g, and the images have fewer details and are distorted in Fig. 7e-g.

In summary, the fusion image of Ref. 36 better retains the target and improves the overall contrast but cannot eliminate the halo phenomenon in the image. The fusion images of Refs. 38 and 26 have serious distortion, fewer texture features, and blurry edges. The fusion images of Refs. 37, 40 and 17 are dark overall, with blurred object details and poor visual effect. The fusion image of Ref. 39 is distorted in the halo part, and the fusion result is not clear. The proposed method can completely eliminate the halo of the visible image, and the objects in its fusion image are clear with obvious edges, which is more in line with human visual characteristics.

The following objective evaluation indicators are used for quantitative comparison.

(1) Root Mean Square Error (RMSE). The value $R_{RMSE}$ of RMSE is obtained by Eq. (12):

$$R_{RMSE} = \sqrt{\frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[F(x, y) - I(x, y)\right]^2} \quad (12)$$

where I is the reference source image and M × N is the image resolution.
(2) Average Gradient (AG). AG describes the overall spatial activity of the image42, reflecting the contrast of small details and the texture variation in the image. The larger the AG value, the clearer the fused image. The value $R_{AG}$ of AG is obtained by Eq. (13):

$$R_{AG} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \sqrt{\frac{\nabla f_x^2(x, y) + \nabla f_y^2(x, y)}{2}} \quad (13)$$

where $\nabla f_x$ and $\nabla f_y$ are the first-order difference operators of the image in the horizontal and vertical directions, respectively.
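A direct implementation of Eq. (13) as given above (a sketch; the boundary handling of the difference operators may differ from the authors' implementation):

```python
import numpy as np

def average_gradient(F: np.ndarray) -> float:
    """Eq. (13): average gradient of a fused image, using first-order
    differences in the horizontal and vertical directions."""
    F = F.astype(np.float64)
    dx = np.diff(F, axis=1)[:-1, :]  # horizontal first-order difference
    dy = np.diff(F, axis=0)[:, :-1]  # vertical first-order difference
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```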
(3) Halation Elimination (HE). HE evaluates the ability of the fusion image to eliminate the interference of the halo part, and can more objectively reflect the subjective impression of the human eye9. The larger the HE value, the better the fusion method eliminates the halo in the visible image. The value $R_{HE}$ of HE is obtained by Eq. (14).
where $R_{SSIM}$ is the structural similarity index value and SSIM(·) is the structural similarity operator43.

(4) Deviation Index (DI). DI represents the relative deviation between the fusion image and the reference image. The smaller the DI value, the smaller the relative difference between the two in spectral information41, that is, the smaller the effect of the fused image on eliminating the halo of the reference image. The value $R_{DI}$ of DI is obtained by Eq. (15).

The objective evaluation indicator values of the fusion images corresponding to the four typical source images are obtained by Eqs. (12)-(16) and the ART, as shown in Table 1.

Table 1. Performance indicators of different algorithms under four types of source images.
In the experiment, 20 sets of source images from the database were randomly selected for fusion, and the six typical objective evaluation indicators were obtained. At the same time, the average value of each objective evaluation indicator is calculated by combining the data of the fused images in Table 1, as shown in Table 2.
According to the data in Table 2, broken-line graphs of the different evaluation indicators are drawn (as shown in Fig. 8), and the proposed method and the seven comparison methods are analyzed objectively.
According to the four typical evaluation indicators, FQ, and ART in Table 2 and Fig. 8, the RMSE, AG, and HE values of the proposed method are the highest, indicating that the proposed method achieves higher fusion quality than the other comparison methods and has an obvious advantage in eliminating the halo in the visible image, which is consistent with the subjective visual analysis. The DI values of Refs. 37, 40 and 17 are relatively small, indicating that these three image fusion methods are prone to losing the details of the visible and infrared images. The HE value of Ref. 39 is small, which means that this method is not suitable for image fusion in complex lighting environments. The AG value of Ref. 38 is the smallest, which means that the fusion image obtained by this method has low overall clarity and poor visual effect. The ART values of Refs. 38, 26 and 39 are larger; that is, fusion methods based on multi-scale decomposition tend to reduce fusion efficiency. The ART value of the proposed method is smaller, which means that the proposed method is more suitable for fields with high real-time requirements.

The above analysis shows that the RMSE, AG and HE values of the proposed method are the largest, indicating that the image fusion quality of the proposed method is better than that of the other comparison methods. The DI value of the proposed method is relatively small because the fusion image has a certain degree of distortion relative to the source image when the halo of the visible image is eliminated. In addition, the comprehensive evaluation FQ value of the proposed method is the largest, indicating that the proposed method obtains more detailed information in the fusion image, higher definition, and stronger anti-interference ability.

Conclusions
The proposed method uses Laplacian sharpening to quickly separate the general and detailed features of the source images, obtaining a basic image containing the halo and contours and detail content containing the texture and edge features. The basic image fusion strategy based on the GE-WA model realizes the reliable fusion of the general features of the visible and infrared images, eliminates the halo parts in the visible image, and reduces the redundancy of the background information in the fused image. The pre-trained VGG-19 network and the multi-layer fusion strategy realize the fusion of the different depth features of the visible and infrared images and yield the fused detail content. The fusion image is then reconstructed from the fused basic image and detail content.
The experimental results show that the performance of the proposed method is comparable to that of existing methods and better than the comparison methods in eliminating halo. By using the GE-WA model and the VGG-19 network, the proposed method overcomes the inability of traditional methods to extract deep and detailed image information, obtains more comprehensive, reliable, and rich scene information, and achieves rapid image fusion in a variety of scenarios.
The RMSE, AG and HE values of the proposed method are the largest and its ART value is smaller, indicating that the image fusion quality and efficiency of the proposed method are better than those of the comparison methods. In order to eliminate the halo parts in the visible image, the proposed method distorts the structural information of the fused image relative to the visible image to a certain extent, resulting in a decrease of the DI value. However, the comprehensive evaluation FQ value of the proposed method is the largest, indicating that the proposed method is more suitable for visible and infrared image fusion in complex environments. In the future, the team will further study more effective multispectral image fusion methods and corresponding fusion image evaluation indicators.